It is always a pleasure to hear Brad Efron's thoughts on the next century of statistics, especially considering the huge influence he has had on the field's present state and future directions, both in model-based and nonparametric inference.

THREE META-PRINCIPLES OF STATISTICS

Before going on, I would like to state three meta-principles of statistics which I think are relevant to the current discussion.

First, the information principle, which is that the key to a good statistical method is not its underlying philosophy or mathematical reasoning, but rather what information the method allows us to use. Good methods make use of more information. This can come in different ways: in my own experience (following the lead of Efron and Morris, 1971, among others), hierarchical Bayes allows us to combine different data sources and weight them appropriately using partial pooling. Other statisticians find parametric Bayes too restrictive: in practice, parametric modeling typically comes down to conventional models such as the normal and gamma distributions, and the resulting inference does not take advantage of distributional information beyond the first two moments of the data. Such problems motivate more elaborate models, which raise new concerns about overfitting, and so on.
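To make "weight them appropriately using partial pooling" concrete, here is a minimal sketch of the normal-normal case. It assumes, for illustration only, that each group j has an estimate y_j with known standard error s_j, and that the group-level mean mu and standard deviation tau are fixed and known; a full hierarchical Bayes analysis would estimate mu and tau as well. The function name `partial_pool` and the numbers are hypothetical.

```python
def partial_pool(y, se, mu, tau):
    """Precision-weighted compromise between each group's estimate and mu.

    Assumes y_j ~ N(theta_j, se_j^2) and theta_j ~ N(mu, tau^2), with mu and
    tau treated as known; returns the posterior means of the theta_j.
    """
    prior_precision = 1.0 / tau**2
    pooled = []
    for y_j, s_j in zip(y, se):
        data_precision = 1.0 / s_j**2
        pooled.append((data_precision * y_j + prior_precision * mu)
                      / (data_precision + prior_precision))
    return pooled


# Hypothetical data: a noisy group (se = 10) and a precise group (se = 1).
print(partial_pool([10.0, 2.0], [10.0, 1.0], mu=0.0, tau=5.0))
# A much larger tau (looser group-level distribution) means less pooling.
print(partial_pool([10.0, 2.0], [10.0, 1.0], mu=0.0, tau=50.0))
```

The noisy estimate is pulled most of the way toward mu while the precise one barely moves; shrinking tau increases the pooling for every group.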

As in many areas of mathematics, theory and practice leapfrog each other: as Efron notes, empirical Bayes methods have made great practical advances but "have yet to form into a coherent theory." In the past few decades, however, with the work of Lindley and Smith (1972) and many others, empirical Bayes has been folded into hierarchical Bayes, which is part of a coherent theory that includes inference, model checking, and data collection (at least in my own view, as represented in chapters 6 and 7 of Gelman et al., 2003). At other times, theoretical and even computational advances lead to practical breakthroughs, as Efron illustrates in his discussion of the progress made in genetic analysis following the Benjamini and Hochberg paper on false discovery rates.

My second meta-principle of statistics is the methodological attribution problem, which is that the many useful contributions of a good statistical consultant, or collaborator, will often be attributed to the statistician's methods or philosophy rather than to the artful efforts of the statistician himself or herself. Don Rubin has told me that scientists are fundamentally Bayesian (even if they do not realize it), in that they interpret uncertainty intervals Bayesianly. Brad Efron has talked vividly about how his scientific collaborators find permutation tests and p-values to be the most convincing form of evidence. Judea Pearl assures me that graphical models describe how people really think about causality. And so on. I am sure that all these accomplished researchers, and many more, are describing their experiences accurately. Rubin wielding a posterior distribution is a powerful thing, as is Efron with a permutation test or Pearl with a graphical model, and I believe that (a) all three can be helping people solve real scientific problems, and (b) it is natural for their collaborators to attribute some of these researchers' creativity to their methods.

The result is that each of us tends to come away from a collaboration or consulting experience with the warm feeling that our methods really work, and that they represent how scientists really think. In stating this, I am not trying to espouse some sort of empty pluralism: the claim that, for example, we would be doing just as well if we were all using fuzzy sets, or correspondence analysis, or some other obscure statistical method. There is certainly a reason that methodological advances are made, and this reason is typically that existing methods have their failings. Nonetheless, I think we all have to be careful about reading too much into our collaborators' and clients' satisfaction with our methods.

My third meta-principle is that different applications demand different philosophies. This principle comes up for me in Efron's discussion of hypothesis testing and the so-called false discovery rate, which I label as "so-called" for the following reason. In Efron's formulation (which follows the classical multiple comparisons literature), a "false discovery" is a zero effect that is identified as nonzero, whereas, in my own work, I never study zero effects. The effects I study are sometimes small, but it would be silly, for example, to suppose that the difference in voting patterns of men and women (after controlling for some other variables) could be exactly zero. My problems with the "false discovery" formulation are partly a matter of taste, I'm sure, but I believe they also arise from the difference between problems in genetics (in which some genes really have essentially zero effects on some traits, so that the classical hypothesis-testing model is plausible) and in social science and environmental health (where essentially everything is connected to everything else, and effect sizes follow a continuous distribution rather than a mix of large effects and near-exact zeroes).
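For readers who have not seen it, the classical formulation treats each of m effects as either exactly zero or not, and the Benjamini and Hochberg step-up rule controls the expected proportion of false discoveries among the rejected hypotheses (under independence or suitable positive dependence). The following is a minimal sketch of that standard rule, not of Efron's own empirical Bayes machinery; the function name and the level q are illustrative choices.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    Sort the m p-values, find the largest rank k with p_(k) <= k * q / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


# The smallest p-value (0.02) fails its own threshold (0.05/3), but it is
# still rejected because larger p-values pass at higher ranks.
print(benjamini_hochberg([0.02, 0.03, 0.04], q=0.05))
```

The whole construction presupposes a set of exactly-zero effects to be screened out, which is the modeling assumption at issue in the paragraph above.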

To me, the false discovery rate is the latest flavor-of-the-month attempt to make the Bayesian omelette without breaking the eggs. As such, it can work fine: if the implicit prior is reasonable, it can be a great method. But I really don't like it as an underlying principle, as it is all formally based on a hypothesis-testing framework that, to me, is more trouble than it's worth. In thinking about multiple comparisons in my own research, I prefer to discuss errors of Type S (sign) and Type M (magnitude) rather than Type 1 and Type 2 (Gelman and Tuerlinckx, 2000; Gelman and Weakliem, 2009; Gelman, Hill and Yajima, 2009). My point here, though, is simply that any given statistical concept will make more sense in some settings than others.
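As a rough illustration of the Type S and Type M idea, the sketch below assumes a single estimate that is normally distributed around a nonzero true effect with known standard error, and calls the estimate "statistically significant" when it is more than 1.96 standard errors from zero. It computes the probability that a significant estimate has the wrong sign (closed form) and the average factor by which significant estimates exaggerate the true magnitude (Monte Carlo). The function and argument names are hypothetical, and this is one simple way to compute these quantities, not necessarily the calculation in the papers cited.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def type_s_and_m(true_effect, se, z_crit=1.96, n_sims=200_000, seed=1):
    """Power, Type S rate, and Type M (exaggeration) ratio for one estimate.

    Assumes estimate ~ N(true_effect, se^2) and 'significant' means
    |estimate| > z_crit * se. true_effect must be nonzero.
    """
    if true_effect == 0:
        raise ValueError("Type S and Type M are defined for a nonzero true effect")
    shift = true_effect / se
    p_hi = 1.0 - normal_cdf(z_crit - shift)  # significant and positive
    p_lo = normal_cdf(-z_crit - shift)       # significant and negative
    power = p_hi + p_lo
    # Type S: probability that a significant estimate has the wrong sign.
    type_s = (p_lo if true_effect > 0 else p_hi) / power
    # Type M: mean |estimate| among significant draws, relative to |true effect|.
    rng = random.Random(seed)
    draws = (rng.gauss(true_effect, se) for _ in range(n_sims))
    significant = [abs(e) for e in draws if abs(e) > z_crit * se]
    type_m = (sum(significant) / len(significant)) / abs(true_effect)
    return power, type_s, type_m


for effect in (0.5, 3.0):
    power, type_s, type_m = type_s_and_m(effect, se=1.0)
    print(f"effect/se={effect}: power={power:.3f}, Type S={type_s:.3f}, Type M={type_m:.2f}")
```

When the true effect is small relative to its standard error, a "significant" estimate is both noticeably likely to have the wrong sign and likely to overstate the magnitude several-fold.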

For another example of how different areas of application merit different sorts of statistical thinking, consider Rob Kass's remark: "I tell my students in neurobiology that in claiming statistical significance I get nervous unless the p-value is much smaller than 0.01." In political science, we are typically not aiming for that level of certainty. (Just to get a sense of the scale of things, there have been barely 100 national elections in all of U.S. history, and political scientists studying the modern era typically start in 1946.)

PROGRESS IN PARAMETRIC BAYESIAN INFERENCE

I also think that Efron is doing parametric Bayesian inference a disservice by focusing on a fun little baseball example that he and Morris worked on 35 years ago. If he were to look at what is being done now, he would see all the good statistical practice that, in his section 10, he naively (I think) attributes to "frequentism." Figure 1 illustrates this with a grid of maps of public opinion by state, estimated from national survey data. Fitting this model took a lot of effort, which was made possible by working within a hierarchical regression framework: "a good set of work rules," to use Efron's expression. Similar models have been used recently to study opinion trends in other areas such as gay rights in which policy is made at the state level, and so we want to understand opinions by state as well (Lax and Phillips, 2009).

I also completely disagree with Efron's claim that frequentism (whatever that is) is "fundamentally conservative." One thing that "frequentism" absolutely encourages is for people to use horrible, noisy estimates out of a fear of "bias." More generally, as discussed by Gelman and Jakulin (2007), Bayesian inference is conservative in that it goes with what is already known, unless the new data force a change. In contrast, unbiased estimates and other unregularized classical procedures are noisy and get jerked around by whatever data happen to come by, which is not really a conservative thing at all. To make this argument more formal, consider the multiple comparisons problem. Classical unbiased comparisons are noisy and must be adjusted to avoid overinterpretation; in contrast, hierarchical Bayes estimates of comparisons are conservative (when two parameters are pulled toward a common mean, their difference is pulled toward zero) and less likely to appear to be statistically significant (Gelman and Tuerlinckx, 2000).
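The parenthetical claim about comparisons can be checked with a two-group normal model. This sketch assumes both groups have the same known standard error sigma and that the common mean mu and group-level standard deviation tau are known (in a full analysis they would be estimated); all names and numbers are illustrative. Each posterior mean keeps a fraction b = tau^2 / (tau^2 + sigma^2) of its raw deviation from mu, so the estimated difference shrinks by the factor b and its z-score by sqrt(b).

```python
import math


def compare_two(y1, y2, sigma, mu, tau):
    """Classical vs. partially pooled comparison of two group estimates.

    Assumes y_k ~ N(theta_k, sigma^2) and theta_k ~ N(mu, tau^2), with mu and
    tau treated as known, so the two posteriors are independent.
    """
    classical_diff = y1 - y2
    classical_z = classical_diff / (math.sqrt(2.0) * sigma)
    b = tau**2 / (tau**2 + sigma**2)  # fraction of each deviation from mu retained
    post_diff = (mu + b * (y1 - mu)) - (mu + b * (y2 - mu))  # equals b * classical_diff
    post_sd_diff = math.sqrt(2.0 * b * sigma**2)             # each posterior variance is b * sigma^2
    post_z = post_diff / post_sd_diff                         # equals sqrt(b) * classical_z
    return classical_diff, classical_z, post_diff, post_z


# Hypothetical numbers: with tau equal to sigma, b = 1/2.
print(compare_two(12.0, 4.0, sigma=4.0, mu=8.0, tau=4.0))
```

Here the raw difference of 8 (z about 1.41) becomes a posterior difference of 4 with z = 1, so the pooled comparison is less likely to look "significant" without any separate multiple-comparisons adjustment.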

[Figure 1: grid of U.S. state maps titled "2000: Do you support school vouchers?", with income categories (under $20,000; $20,000-40,000; $40,000-75,000; $75,000-150,000; over $150,000) and religion/ethnicity groups (all voters; white Catholics; white evangelical Protestants; white non-evangelical Protestants; white other/no religion; blacks; Hispanics; other races). Color scale from 20% through 45% to 70%. The state is left blank where a category represents less than 1% of the voters of a state.]

Fig. 1. Estimated proportion of voters in each state who support federal spending on school vouchers, broken down by religion/ethnicity and income categories. The estimates come from a hierarchical Bayesian analysis fit to data from the National Annenberg Election Survey, adjusted to population and voter turnout data from the U.S. Census.

Another way to understand this is to consider the "machine learning" problem of estimating the probability of an event on which we have very little direct data. The most conservative stance is to assign a probability of 1/2; the next-conservative approach might be to use some highly smoothed estimate based on averaging a large amount of data; and the unbiased estimate based on the local data is hardly conservative at all! Figure 1 illustrates our conservative estimate of public opinion on school vouchers. We prefer this to a noisy, implausible map of unbiased estimators.
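The stances described above, plus a compromise between them, can be written down for a binary outcome. In this sketch, which is an illustrative assumption rather than the model behind Figure 1, the compromise is the posterior mean under a beta prior centered on the global rate and worth a hypothetical `prior_strength` pseudo-observations; the function name and data are also hypothetical.

```python
def probability_estimates(y_local, n_local, y_global, n_global, prior_strength=20.0):
    """Estimates of an event probability, from most to least conservative."""
    global_rate = y_global / n_global
    # Beta(a, b) prior centered on the global rate, worth prior_strength observations.
    a = prior_strength * global_rate
    b = prior_strength * (1.0 - global_rate)
    return {
        "ignorance (1/2)": 0.5,
        "highly smoothed": global_rate,
        "partially pooled": (y_local + a) / (n_local + a + b),
        "unbiased local": y_local / n_local if n_local > 0 else float("nan"),
    }


# Hypothetical: 2 events in 2 local trials; 300 events in 1,000 trials overall.
for name, value in probability_estimates(2, 2, 300, 1000).items():
    print(f"{name:>18}: {value:.3f}")
```

With only two local trials, the unbiased estimate jumps all the way to 1, while the partially pooled estimate moves only modestly away from the global rate; as n_local grows, the pseudo-observations matter less and the two converge.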

Of course, frequentism is a big tent and can be interpreted to include all sorts of estimates, up to and including whatever Bayesian thing I happen to be doing this week: to make any estimate "frequentist," one just needs to do whatever combination of theory and simulation is necessary to get a sense of my method's performance under repeated sampling. So maybe Efron and I are in agreement in practice, that any method is worth considering if it works, but it might take some work to see if something really does indeed work.
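Here is what such a repeated-sampling check might look like in its simplest form. The sketch fixes a set of hypothetical true effects, repeatedly simulates noisy data, and compares the mean squared error of the unbiased estimate y_j with that of a partially pooled estimate that uses plugged-in, assumed-known values of mu and tau. The function name, seed, and numbers are illustrative.

```python
import random


def repeated_sampling_mse(thetas, sigma, mu, tau, n_reps=20_000, seed=2):
    """Mean squared error of raw vs. partially pooled estimates over repeated data.

    The true effects thetas are held fixed; each replication draws
    y_j ~ N(theta_j, sigma^2) and shrinks it toward mu by b = tau^2 / (tau^2 + sigma^2).
    """
    rng = random.Random(seed)
    b = tau**2 / (tau**2 + sigma**2)
    sq_err_raw = sq_err_pooled = 0.0
    for _ in range(n_reps):
        for theta in thetas:
            y = rng.gauss(theta, sigma)
            pooled = mu + b * (y - mu)
            sq_err_raw += (y - theta) ** 2
            sq_err_pooled += (pooled - theta) ** 2
    n = n_reps * len(thetas)
    return sq_err_raw / n, sq_err_pooled / n


raw, pooled = repeated_sampling_mse([-1.0, 0.0, 1.0, 2.0, -2.0], sigma=2.0, mu=0.0, tau=2.0)
print(f"unbiased MSE: {raw:.2f}, partially pooled MSE: {pooled:.2f}")
```

For true effects near mu, the pooled estimator has much lower error; an effect far from mu would be overshrunk, which is exactly the kind of tradeoff such a simulation is meant to reveal.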

COMMENTS ON KASS'S COMMENTS 

Before writing this discussion, I also had the opportunity to read Rob Kass's comments on Efron's article.

I pretty much agree with Kass's points, except for his claim that most of Bayes is essentially maximum likelihood estimation. Multilevel modeling is only approximately maximum likelihood if you follow Efron and Morris's empirical Bayesian formulation in which you average over intermediate parameters and maximize over hyperparameters, as I gather Kass has in mind. But then this makes "maximum likelihood" a matter of judgment: what exactly is a hyperparameter? Things get tricky with mixture models and the like. I guess what I'm saying is that maximum likelihood, like many classical methods, works pretty well in practice only because practitioners interpret the methods flexibly and do not do the really stupid versions (such as joint maximization of parameters and hyperparameters) that are allowed by the theory.
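The failure of joint maximization can be seen in the normal hierarchical model y_j ~ N(theta_j, sigma_j^2), theta_j ~ N(mu, tau^2). In the sketch below (hypothetical data and function names), setting every theta_j equal to mu and letting tau shrink toward zero makes the joint density of data and parameters grow without bound, so joint maximization has no sensible answer. Maximizing the marginal likelihood, with the theta_j averaged over as in the empirical Bayes formulation, gives a finite estimate of tau, found here by a crude grid search.

```python
import math


def log_normal(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def joint_log_density(y, sigma, theta, mu, tau):
    """log p(y | theta) + log p(theta | mu, tau): parameters and hyperparameters together."""
    return sum(log_normal(y_j, t_j, s_j) + log_normal(t_j, mu, tau)
               for y_j, s_j, t_j in zip(y, sigma, theta))


def marginal_log_lik(y, sigma, mu, tau):
    """log p(y | mu, tau) with theta integrated out: y_j ~ N(mu, sigma_j^2 + tau^2)."""
    return sum(log_normal(y_j, mu, math.sqrt(s_j**2 + tau**2)) for y_j, s_j in zip(y, sigma))


def marginal_tau_hat(y, sigma, grid):
    """Grid-search maximizer of the marginal likelihood over tau, with mu profiled out."""
    def profile(tau):
        w = [1.0 / (s**2 + tau**2) for s in sigma]
        mu = sum(w_j * y_j for w_j, y_j in zip(w, y)) / sum(w)  # best mu for this tau
        return marginal_log_lik(y, sigma, mu, tau)
    return max(grid, key=profile)


y = [12.0, -6.0, 9.0, -3.0]   # hypothetical group estimates
sigma = [4.0, 4.0, 4.0, 4.0]  # their (assumed known) standard errors
mu = sum(y) / len(y)
for tau in (1.0, 0.1, 0.001):
    # Joint density at theta_j = mu keeps increasing as tau -> 0.
    print(tau, joint_log_density(y, sigma, [mu] * len(y), mu, tau))
print("marginal tau_hat:", marginal_tau_hat(y, sigma, [0.05 * i for i in range(401)]))
```

The contrast also shows why "maximum likelihood" here depends on a judgment about which quantities count as hyperparameters to be maximized and which as parameters to be averaged over.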

Regarding the difficulties of combining evidence across species (in Kass's discussion of the DuMouchel and Harris paper), one point here is that this works best when the parameters have a real-world meaning. This is a point that became clear to me in my work in toxicology (Gelman, Bois and Jiang, 1996): when you have a model whose parameters have numerical interpretations ("mean," "scale," "curvature," and so forth), it can be hard to get useful priors for them, but when the parameters have substantive interpretations ("blood flow," "equilibrium concentration," etc.), then this opens the door for real prior information. And, in a hierarchical context, "real prior information" does not have to mean a specific, pre-assigned prior; rather, it can refer to a model in which the parameters have a group-level distribution. The more real-worldy the parameters are, the more likely this group-level distribution can be modeled accurately. And the smaller the group-level error, the more partial pooling you will get and the more effective your Bayesian inference is. To me, this is the real connection between scientific modeling and the mechanics of Bayesian smoothing, and Kass alludes to some of this in the final paragraph of his comment.

Hal Stern once said that the big divide in statistics is not between Bayesians and non-Bayesians but rather between modelers and non-modelers. And, indeed, in many of my Bayesian applications, the big benefit has come from the likelihood. But sometimes that is because we are careful in deciding what part of the model is "the likelihood." Nowadays, this is starting to have real practical consequences even in Bayesian inference, with methods such as DIC, Bayes factors, and posterior predictive checks, all of whose definitions depend crucially on how the model is partitioned into likelihood, prior, and hyperprior distributions.

On one hand, I am impressed by modern machine-learning methods that process huge datasets with minimal assumptions; on the other hand, I appreciate Kass's concluding point that statistical methods are most powerful when they are connected to the particular substantive question being studied. I agree that statistical theory is far from settled, and I agree with Kass that developments in Bayesian modeling are a promising way to move forward.